This paper is an overview of dataflow programming architectures. It traces the formation of current dataflow thought from its origins to the present.

It is interesting to see the two perspectives of research in this area. One perspective seeks to design a textual or visual language that aids parallelization and guarantees determinism; the other simply seeks to write practical, efficient code. The former group works from a perspective of theory, proof, and mathematical rigor, and the latter from a perspective of practicality. The paper makes reference to this at several points, and explicitly recognizes it in the section on non-determinism:
The dichotomy in regard to nondeterminism appears to be the result of a division between those who wish to use dataflow as a means to ease the formal proof and analysis of programs and those who wish to use dataflow for all varieties of programming problems. The former regard determinacy as essential, whereas the latter regard the provision of nondeterminacy as essential.
It is unfortunate that
Max/MSP and Pd are conspicuously missing from the paper's discussion.
Useful statements:
0. The "no change" of value
requirement seems to be the hardest issue to get around in pure data flow
languages.
The best list of features that constitute a dataflow language was put forward by Ackerman [1982] and reiterated by Whiting and Pascoe [1994] and Wail and Abramson [1995]. This list includes the following:
(1) freedom from side effects,
(2) locality of effect,
(3) data dependencies equivalent to scheduling,
(4) single assignment of variables,
(5) an unusual notation for iterations due to features 1 and 4,
(6) lack of history sensitivity in procedures.
Because scheduling is determined from data dependencies, it is important that the values of variables do not change between their definition and their use.
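The idea behind feature (3) can be sketched in a few lines: if each node declares which values it consumes, a legal execution schedule falls out of the dependency graph alone. This is an illustrative sketch, not code from the paper; the node names and use of a topological sort are my own assumptions.

```python
# Deriving a schedule purely from data dependencies (illustrative sketch).
from graphlib import TopologicalSorter

# Each node maps to the set of nodes whose outputs it consumes.
deps = {
    "a": set(),          # a = input
    "b": set(),          # b = input
    "c": {"a", "b"},     # c = a + b
    "d": {"c"},          # d = c * 2
}

order = list(TopologicalSorter(deps).static_order())
# Any node whose dependencies are satisfied may fire; "a" and "b" are
# independent and could run in parallel.
print(order)
```

Because no variable changes between definition and use, this schedule is valid no matter how the independent nodes are interleaved.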
1. It was realized that the parallelism used in dataflow architectures operated at too fine a grain and that better performance could be obtained through hybrid von Neumann dataflow architectures. Many of these architectures [Bic 1990] took advantage of more coarse-grained parallelism where a number of dataflow instructions were grouped and executed in sequence.
2. It is clear that dataflow offers the potential for a substantial speed improvement by utilizing data dependencies to locate parallelism.
3. About the flow of data through a fork:
... from input Y signify that that value is to be duplicated. Forking arcs in this manner is essential if a data token is needed by two different nodes. A data token that reaches a forked arc gets duplicated and a copy sent down each branch. This preserves the data independence and the functionality of the system.
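The duplication rule above can be made concrete with a tiny sketch (my own illustration, not the paper's code): each branch of a fork receives an independent copy, so mutating one copy cannot affect the other consumer.

```python
# A forked arc duplicates each token so both consumers get an independent
# copy (illustrative sketch; the function name is an assumption).
import copy

def fork(token):
    """Return independent copies of a token, one per output branch."""
    return copy.deepcopy(token), copy.deepcopy(token)

left, right = fork({"value": 42})
left["value"] += 1           # mutating one branch's copy...
assert right["value"] == 42  # ...leaves the other branch's token unchanged
```

The deep copy is what preserves the data independence the quote refers to.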
4. Alternative dataflow models: two types of data flow are described. In one, the data is passed along from node to node, duplicated at forks, and merged or switched at joins. In the other model, each node creates its own representation of the data, keeping its own copy.
In the structure model, however, each node creates only one data object on each arc that remains there: the node builds one or more data structures on its output arcs. It is possible for these structures to hold infinite arrays of values, permitting open-ended execution, and creating the same effect as the token model, but with the advantage that the structure model permits random access of the data structures and history sensitivity.
and:
However, the structure model has the key disadvantage that it cannot store data efficiently. It requires all data generated to be stored, to preserve the ability to examine the history of the program. Token models are inherently more efficient at storing data.
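A rough way to see the trade-off (a loose analogy of mine, not the paper's formulation): a token arc consumes values as they are read, while a structure arc retains everything ever produced, buying random access and history at a storage cost.

```python
# Token model vs. structure model, sketched with two kinds of arcs.
from collections import deque

token_arc = deque()
token_arc.append(5)
v = token_arc.popleft()       # the token is consumed; no history remains
assert len(token_arc) == 0

structure_arc = []            # structure model: values accumulate on the arc
structure_arc.append(5)
structure_arc.append(7)
assert structure_arc[0] == 5  # random access and full history are available
```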
5. Two approaches to execution activation: the data availability approach and the data demand approach. In the data availability approach, each node fires when its data becomes available or changes. In the demand approach, a node fires when it receives a request for data from one of its output arcs.
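The two activation rules can be sketched side by side. This is an illustrative sketch under my own naming; neither function comes from the paper.

```python
# Data-driven: a node fires as soon as all of its inputs are available.
def data_driven_add(inputs):
    if "x" in inputs and "y" in inputs:  # availability triggers firing
        return inputs["x"] + inputs["y"]
    return None                          # inputs incomplete: not ready yet

# Demand-driven: a node computes only when its output is requested,
# pulling (demanding) its inputs from upstream producers on demand.
def demand_driven_add(get_x, get_y):
    return get_x() + get_y()             # the demand propagates upstream

assert data_driven_add({"x": 2, "y": 3}) == 5
assert data_driven_add({"x": 2}) is None         # fires only when both arrive
assert demand_driven_add(lambda: 2, lambda: 3) == 5
```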
The paper describes a number of visual dataflow programming languages, including LabVIEW and Prograph.
Four research areas of data flow:
- The provision of iteration in textual dataflow languages. The problem is the nasty requirement that variables only be assigned once, which makes an index variable difficult. The solution is to use a statement of the form new x = x + 1, meaning that in the next iteration, x will be x + 1.
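The new x = x + 1 idea can be modeled by treating each iteration's x as a fresh binding rather than a mutated variable. The sketch below is my own rendering of that idea, with each successive binding recorded as an element of a sequence.

```python
# Single-assignment iteration: "new x = x + 1" binds a fresh x each step;
# nothing is ever overwritten (illustrative sketch).
def iterate(x0, steps):
    history = [x0]                        # x at iteration 0
    for _ in range(steps):
        history.append(history[-1] + 1)   # new x = x + 1: bind, never mutate
    return history

assert iterate(0, 3) == [0, 1, 2, 3]
```

Every instance of x remains available, which is consistent with the single-assignment rule that scheduling depends on.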
- Iteration structures in DFVPLs. The solution to this problem that makes the most sense is the use of embedded iteration objects, as implemented in LabVIEW.
- The use of data structures. The main problem here is that dataflow requires a data element to be assigned a value only once. The solutions have been: 1. using pointers and reference counts to keep track of assignment changes; 2. I-structures, where variables are created with undefined values and only undefined variables may take on an assignment; 3. a hybrid of the two, where new copies of the variable are created when an assignment is made to a variable with more than one reference.
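The I-structure idea (solution 2) can be sketched as a write-once cell: it starts undefined, accepts exactly one assignment, and rejects a second write. The class name and error handling below are my own assumptions, not the paper's.

```python
# A write-once I-structure cell (illustrative sketch).
class ICell:
    _EMPTY = object()  # sentinel marking an undefined cell

    def __init__(self):
        self._value = ICell._EMPTY

    def write(self, value):
        if self._value is not ICell._EMPTY:
            raise RuntimeError("I-structure cell already written")
        self._value = value

    def read(self):
        if self._value is ICell._EMPTY:
            raise RuntimeError("I-structure cell still undefined")
        return self._value

cell = ICell()
cell.write(10)        # the single permitted assignment
assert cell.read() == 10
```

A real I-structure would also block readers until the cell is written; this sketch only captures the single-assignment constraint.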
- Nondeterminism. The addition of random merge objects seems to be the solution of choice.
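A random merge node can be sketched as follows: whichever input "arrives" first is forwarded, so the interleaving of the two streams is not fixed, even though each stream's internal order is preserved. This is my own illustration, with a random choice standing in for arrival timing.

```python
# Nondeterministic (random) merge of two token streams (illustrative sketch).
import random

def merge(stream_a, stream_b):
    a, b = list(stream_a), list(stream_b)
    out = []
    while a or b:
        # Pick a non-empty input at random, simulating arrival order.
        src = random.choice([s for s in (a, b) if s])
        out.append(src.pop(0))
    return out

result = merge([1, 2], [10, 20])
assert sorted(result) == [1, 2, 10, 20]       # same tokens every run...
assert result.index(1) < result.index(2)      # ...per-stream order preserved
# ...but the interleaving of the two streams may differ from run to run.
```

This is exactly the construct that breaks determinacy, which is why the theory-oriented camp resists it and the practice-oriented camp demands it.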